Abstract

We show how to use the RSA one-way accumulator to realize an efficient and dynamic authenticated dictionary, where untrusted directories provide cryptographically verifiable answers to membership queries on a set maintained by a trusted source. Our accumulator-based scheme for authenticated dictionaries supports efficient incremental updates of the underlying set by insertions and deletions of elements. Also, the user can optimally verify in constant time the authenticity of the answer provided by a directory with a simple and practical algorithm. In particular, we show how to perform updates and queries in $O(\sqrt{n})$ time while keeping the constant-time verification algorithm exactly the same as in previous inefficient schemes. In addition, at the expense of slightly increasing the conceptual complexity of the verification, we show that there is an accumulator-based approach to the authenticated dictionary problem that achieves $O(n^{\epsilon})$-time performance for updates and queries, while keeping O(1) verification time, where $\epsilon$ is any fixed constant such that $\epsilon > 0$. We have also implemented this scheme, and we give empirical results that can be used to determine the best strategy for systems implementation with respect to the resources that are available. This work has applications to certificate revocation in public key infrastructure and to end-to-end integrity of data collections published by third parties on the Internet.

1 Introduction



Modern distributed transactions often operate in an asymmetric computational environment. Typically, client applications are deployed on small devices, such as laptop computers and palm devices, whereas the server side of these applications is often deployed on large-scale multiprocessors. Moreover, several client applications communicate with powerful server farms over wireless connections or slow modem-speed connections. Thus, distributed applications are facilitated by solutions that involve small amounts of computing and communication on the client side, without overly burdening the more powerful server side of these same applications. The challenge we address in this paper is how to incorporate added levels of information assurance and security into such applications without significantly increasing the amount of computation and communication that is needed at the client (while at the same time keeping the computations on the servers reasonable).

A major aspect of our approach to this challenge is to replicate the computations of servers 
throughout mirror sites in the network, so as to reduce the network latency experienced by users 
in their client applications. This approach is used, for example, by Akamai Technologies to push
images and other content to web servers that are close to client browsers. Thus, a user will in 
general be much closer to one of these mirror sites than to the source of the service, and will 
therefore experience a faster response time from a mirror than it would by communicating directly 
with the source. In addition, by off-loading user servicing from the source, this distributed scheme 
protects the source from denial-of-service attacks and allows for load balancing across the mirror 
sites, which further improves performance. Indeed, for the scope of this paper we are interested in 
supporting applications where clients can avoid online contact with the source. 

An information security problem arising in the replication of data to mirror sites is the authen- 
tication of the information provided by the sites. Indeed, there are applications where the user may 
require that data coming from a mirror site be cryptographically validated as being as genuine as 
they would be had the response come directly from the source. For example, a financial speculator 
who receives NASDAQ stock quotes from the Yahoo! Finance Web site would be well advised to
get a proof of the authenticity of the data before making a large trade. 

For all data replication applications, and particularly for e-commerce applications in wireless 
computing, we desire solutions that involve short responses from a mirror site that can be quickly 
verified with low computational overhead. 

1.1 Problem Definition 

More formally, the problem we address involves three groups of related parties: trusted information 
sources, untrusted directories, and users. An information source defines a finite set S of elements 
that evolves over time through insertions and deletions of items. Directories maintain copies of the 
set S. Each directory storing S receives time-stamped updates from the source for S together with 
update authentication information, such as signed statements about the update and the current 
elements of the set. A user performs membership queries on the set S of the type "is element e in set S?" but, instead of contacting the source for S directly, it queries a directory for S. The contacted directory provides the user with a response to the query together with query authentication information, which yields a proof of the answer assembled by combining statements signed by the source. The user then verifies the proof by relying solely on its trust in the source and on the availability of public information about the source that allows it to check the source's signature. The data structures used by the source and the directory to maintain set S, together with the protocol for queries and updates, are called an authenticated dictionary [39]. Figure 1 shows a schematic view of an authenticated dictionary.
Figure 1: Authenticated dictionary.

The design of an authenticated dictionary should address several goals. These goals include low 
computational cost, so that the computations performed internally by each entity (source, directory, 
and user) should be simple and fast, and low communication overhead, so that bandwidth utilization 
is minimized. Since these goals are particularly important for the user, we say that an authenticated dictionary is size-oblivious if the query authentication information size and the verification time do not depend on the number of items in the dictionary. Size-oblivious solutions to the authenticated dictionary problem are ideally suited for wireless e-commerce applications, where user devices have low computational power and low bandwidth. In addition, size-oblivious solutions add an extra level of security, since the size of the dictionary is never revealed to users.



1.2 Applications 

Authenticated dictionaries have a number of applications. One such application is in third-party data publication on the Internet [?], where third parties publish critical information, catalog entries, and design specifications for content providers who wish to outsource the business of publishing this information and processing transactions involving it.

In this case, the players in our framework are as follows: the source is a trusted organization 
(e.g., a stock exchange) that produces and maintains integrity-critical content (e.g., stock prices) 
and allows third parties (e.g., Web portals) to publish this content on the Internet so that it is
widely disseminated. The publishers store copies of the content produced by the source and process 
queries on such content made by the users. In addition to returning the result of a query, a publisher 
also returns a proof of authenticity of the result, thus providing a validation service. Publishers 
also perform content updates originating from the source. Even so, the publishers provide this 
added value and are able to charge for it without the added cost of deploying all the mirror sites 
in high-security firewall-protected environments. Indeed, the publishers are not assumed to be 
trustworthy, for a given publisher may be processing updates from the source incorrectly or it may be the victim of a system break-in.

Another application of the authenticated dictionary is in certificate revocation [?, ?, 22, 27, 29, 38], where the source is a certification authority (CA) that digitally signs certificates binding entities to their public keys, thus guaranteeing their validity. These certificates are then used to authorize secure socket layer (SSL) connections to e-stores and business-to-business exchanges. Nevertheless, certificates are sometimes revoked (e.g., if a private key is lost or compromised, or if someone loses their authority to use a particular private key). Thus, the user of a certificate must be able to verify that a given certificate has not been revoked.

To facilitate such queries, the set of revoked certificates is distributed to certificate revocation 
directories, which process revocation status queries on behalf of users. The results of such queries 
need to be trustworthy, for they often form the basis for electronic commerce transactions. 

Finally, authenticated dictionaries could be used in military and research applications for the authenticated querying of information repositories, such as coalition documents, mission logs, genomic databases [28], and astrophysical databases (like the object catalog of the Sloan Digital Sky Survey [12, 32, 33]). Given the significant defense and scientific benefits that can result from such querying, users need to be certain that the results of their queries are accurate and current.

1.3 Previous and Related Work 

Authenticated dictionaries are related to research in distributed computing (e.g., data replication in a network [?]), data structure design (e.g., program checking [?, 11] and memory checking [9, ?]), and cryptography (e.g., incremental cryptography [?, 20, 21]).

Previous additional work on authenticated dictionaries has been conducted primarily in the 
context of certificate revocation. The traditional method for certificate revocation (e.g., see [29]) is for the CA (source) to sign a statement consisting of a timestamp plus a hash of the set of all revoked certificates, called a certificate revocation list (CRL), and periodically send the signed CRL to the directories. This approach is secure, but it is inefficient, for it requires the transmission of the entire set of revoked certificates for both source-to-directory and directory-to-user communication. Thus, this solution is clearly not size-oblivious, and even more recent modifications of this solution, which are based on delta-CRLs [17], are not size-oblivious.



Micali [38] proposes an alternate approach, where the source periodically sends to each directory the list of all issued certificates, each tagged with the signed timestamped value of a one-way hash function (e.g., see [42]) that indicates whether this certificate has been revoked. This approach allows the system to reduce the size of the query authentication information to O(1) words: namely, just a certificate identifier and a hash value indicating its status. Unfortunately, this scheme requires the size of the update authentication information to increase to O(N), where N is the number of all nonexpired certificates issued by the certifying authority, which is typically much larger than the number n of revoked certificates. It is size-oblivious for immediate queries, but it cannot be used for time stamping for archiving purposes, since no digest of the collection is ever made.



The hash tree scheme introduced by Merkle [36, 37] can be used to implement a static authenticated dictionary, which supports the initial construction of the data structure followed by query operations, but not update operations. A hash tree T for a set S stores the elements of S at the leaves of T and a hash value h(v) at each node v, which combines the hashes of its children. The authenticated dictionary for S consists of the hash tree T plus the signature of a statement consisting of a timestamp and the value h(r) stored at the root r of T. An element e is proven to belong to S by reporting the values stored at the nodes on the path in T from the node storing e to the root, together with the values stored at the siblings of the nodes on this path. This solution is not size-oblivious, since the length of this path depends on n. Kocher [30] also advocates a static hash tree approach for realizing an authenticated dictionary, but somewhat simplifies the processing done by the user to validate that an item is not in the set S, by storing intervals instead of individual elements. Other certificate revocation schemes, based on variations of cryptographic hashing, have been recently proposed in [?, 22], but, like the static hash tree, these schemes have logarithmic verification time.
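To make the path-length argument concrete, here is a minimal Merkle hash tree sketch. It assumes SHA-256 as the hash and a power-of-two number of leaves, neither of which is prescribed by the schemes cited above, and the helper names (`build_levels`, `prove`, `verify`) are hypothetical. Only sibling hashes are sent; the user recomputes the path values from them. The proof carries one hash per level, so its length grows as log2 n, which is why this approach is not size-oblivious.

```python
import hashlib

def H(data: bytes) -> bytes:
    """Node hash (SHA-256 is an illustrative assumption)."""
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """All tree levels, from leaf hashes up to the root; assumes len(leaves) is a power of two."""
    level = [H(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Sibling hashes on the leaf-to-root path, each tagged with whether it is a left sibling."""
    proof = []
    for level in levels[:-1]:
        sib = index ^ 1
        proof.append((level[sib], sib < index))
        index //= 2
    return proof

def verify(root, leaf, proof):
    """Recompute the path values bottom-up and compare with the signed root."""
    cur = H(leaf)
    for sib, sib_is_left in proof:
        cur = H(sib + cur) if sib_is_left else H(cur + sib)
    return cur == root
```

With 8 leaves, every proof has 3 sibling hashes; doubling the set adds one more.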

Using techniques from incremental cryptography, Naor and Nissim [39] dynamize hash trees 
to support the insertion and deletion of elements. In their scheme, the source and the directory 
maintain identically-implemented 2-3 trees. Each leaf of such a 2-3 tree T stores an element of 
set S, and each internal node stores a one-way hash of its children's values. Hence, the source-to-directory communication is reduced to O(1) items, but the directory-to-user communication remains at O(log n), where n is the size of set S. Consequently, their solution is also not size-oblivious.

Goodrich and Tamassia [24] have devised a data structure for an authenticated dictionary based on skip lists [40]. This data structure matches the asymptotic performance of the Naor-Nissim approach [39], while simplifying the details of an actual implementation of a dynamic authenticated
dictionary. 

Goodrich, Tamassia and Schwerin [26] present the software architecture and implementation of an authenticated dictionary based on the above approach, and Anagnostopoulos, Goodrich and Tamassia [3] introduce the notion of persistent authenticated dictionaries, where a user can issue historical queries of the type "was element e in set S at time t?"

Martel et al. [34] introduce a general approach for the design of authenticated data structures. 
They consider the class of data structures such that (i) the links of the structure form a directed acyclic graph G of bounded degree and with a single source node; and (ii) queries on the data structure correspond to a traversal of a subdigraph of G starting at the source. They show that such data structures can be authenticated by means of a hashing scheme that digests the entire digraph G into a hash value at its source. With this scheme, the sizes of the answer authentication information and the verification time are proportional to the size of the subdigraph traversed. Thus, their approach is not size-oblivious. Along these same lines, Cohen, Goodrich, Tamassia and Triandopoulos [16] show how to efficiently authenticate data structures for fundamental problems on networks, such as path and connectivity queries, and on geometric objects, such as intersection and containment queries. Their algorithms are more general than those of Martel et al., but they still are not size-oblivious.

Independently of the preliminary announcement of the current paper [25], Camenisch and Lysyanskaya investigate dynamic accumulators. They give a zero-knowledge protocol and a proof that a committed value is in the accumulator with respect to the Pedersen commitment scheme. They also present applications to revocation for group signatures, identity escrow schemes and anonymous credential systems. They do not achieve the kinds of performance tradeoffs we achieve in this paper.

1.4 Our Results 

In this paper we present a number of size-oblivious solutions to the authenticated dictionary prob- 
lem. The general approach we follow here is to abandon the approach of the previous methods 
cited above that are based on applying one-way hash functions to nodes in a data structure. Instead, we make use of RSA one-way accumulators, as advocated by Benaloh and de Mare [6].
Such an approach is immediately size-oblivious, but there is an additional challenge that has to 
be overcome to make this approach practical. The computations needed at the source and/or di- 
rectories in a straightforward implementation of the Benaloh-de Mare scheme are inefficient. Our 
main contribution, therefore, is a mechanism to make the computations at the source and mirrors 
efficient. 

The rest of this paper is organized as follows. In Section 2 we review the RSA accumulator [6] and other concepts used in our approach. We also present some basic tools that are used in the rest of the paper, including a description of a straightforward application of the RSA accumulator to the authenticated dictionary problem. In Section 3 we describe an improvement of this scheme that gives constant query and verification times but linear update time. This improvement, called precomputed accumulations, consists of an efficient precomputation by the source of auxiliary data used by the directories to speed up query processing. In Section 4 we present our complete solution, which trades off query time against update time while preserving constant verification time by the user. For example, we can balance the two times and achieve $O(\sqrt{n})$ query and update time and O(1) verification time, where n is the current number of elements. An alternative solution is presented in Section 5, where we present a parameterized accumulations scheme. This scheme, suitable for large data sets, achieves $O(n^{\epsilon})$ query and update time and O(1) verification time, where $\epsilon$ is any fixed constant such that $\epsilon > 0$. Section 6 discusses the security of our scheme. In Section 7 we present the performance of our implementation of the scheme. Finally, concluding remarks are given in Section 8. Throughout this paper, we denote with n the current number of elements of the set S stored in the authenticated dictionary.

2 Preliminaries 

In this section, we discuss some preliminary concepts used in our constructions. 
2.1 The Security Model 

As discussed above, in the authenticated dictionary problem, there are three parties, source, direc- 
tory, and user, and a set S of elements that are of interest to these three parties. Specifically, the 
roles of the three parties are as follows: 
• The source: this is a trusted entity who is the author and authenticator for set S. The source maintains the set S and vouches for its accuracy and content. The set S is allowed to change over time, with insertions adding elements to S and deletions removing items from S. At regular time intervals, the source communicates these changes to the directory together with a signed statement (s, t), which we call the basis, consisting of a timestamp t and a value s associated with the contents of set S at time t. We denote with $\Delta$ the difference between two consecutive timestamps. Parameter $\Delta$ and the public key of the source are known to all the parties.

• The directory: this is an untrusted entity that periodically receives from the source the updates on set S and the new signed basis for S. The directory also answers membership queries issued by a user. A membership query consists of asking whether a query item e is in the most recent version of set S. The directory returns to the user a response $R_e$ ("$e \in S$" or "$e \notin S$") together with a proof of the response, consisting of a verification statement $C_e$ and the latest signed basis (s, t) from the source. The directory must always answer each query in this way, but it can attempt to forge false answers and verification certificates.

• The user: this is an entity that issues a membership query for an element e to the directory. After receiving the response $R_e$, statement $C_e$ and the signed basis (s, t), the user verifies the signature of the basis and then runs a verification algorithm that takes as input e, $R_e$, $C_e$ and s, and outputs true if and only if $R_e$ is the correct answer to the query about element e for the dictionary S at time t. Finally, the user determines the timeliness of the response by checking that the current time is between the returned timestamp t and the next timestamp $t + \Delta$.

In this paper, we are particularly interested in schemes for the authenticated dictionary problem 
such that the size of the verification statement and the running time of the verification algorithm 
are constant. 

2.2 Reducing Dynamic Membership Determination to Dynamic Membership Verification

The authenticated dictionary problem requires the validation of two-sided answers, that is, whether "$e \in S$" or "$e \notin S$". As observed by Kocher [30], any authentication scheme that can perform secure membership verification, that is, one that can provide verification of "$e \in S$" statements under element insertions and deletions, can be extended to provide verification of set membership determination.

Let h be a collision-resistant cryptographic hash function mapping the universe of possible elements for set S to k-bit integers. We show that we can derive a membership determination scheme for set S from a membership verification scheme for 2k-bit integers. Let $X = (x_1, \ldots, x_n)$ be the sorted sequence of hash values of the elements of S under hash function h. We define

$$Y = \{x_0 \| x_1,\; x_1 \| x_2,\; \ldots,\; x_{n-1} \| x_n,\; x_n \| x_{n+1}\},$$

where $\|$ denotes concatenation, $x_0 = 0$, and $x_{n+1} = 2^k - 1$. Each element $y_i = x_i \| x_{i+1}$ of Y represents the interval of hash values $[x_i, x_{i+1}]$. An insertion in set S corresponds to two insertions and one deletion in set Y. Similarly, a deletion from set S corresponds to one insertion and two deletions in set Y.

Suppose we have a membership verification scheme for Y. We can build a membership determination scheme for S as follows. To prove $e \in S$, we return the proof of either $x_{i-1} \| x_i \in Y$ or $x_i \| x_{i+1} \in Y$, where $x_i = h(e)$. To prove that $e \notin S$, we return the proof that $x_i \| x_{i+1} \in Y$, where $x_i < h(e) < x_{i+1}$. Therefore, for the remainder of this paper, we concentrate on set membership verification, since we can reduce set membership determination to set membership verification.
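The interval construction can be sketched as follows. This is a toy sketch under stated assumptions: the k-bit hash is SHA-256 truncated to k = 32 bits (a hypothetical choice), hash values are assumed distinct and strictly between the sentinels $x_0 = 0$ and $x_{n+1} = 2^k - 1$, and the membership verification scheme for Y itself is abstracted away; `check` only confirms that a returned y brackets h(e) correctly.

```python
import hashlib
from bisect import bisect_left

K = 32  # hypothetical hash width k in bits

def h(element: str) -> int:
    """Toy k-bit hash: SHA-256 truncated to its top K bits (an assumption)."""
    digest = hashlib.sha256(element.encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - K)

def concat(lo: int, hi: int) -> int:
    """lo || hi as a 2k-bit integer."""
    return (lo << K) | hi

def build_Y(S):
    """Return the interval set Y and the sorted, sentinel-padded list x_0, ..., x_{n+1}."""
    xs = [0] + sorted(h(e) for e in S) + [(1 << K) - 1]
    Y = {concat(xs[i], xs[i + 1]) for i in range(len(xs) - 1)}
    return Y, xs

def answer(e, xs):
    """Directory side: return (is e in S?, y), where y in Y is the element whose membership proves it."""
    v = h(e)
    assert 0 < v < (1 << K) - 1, "toy sketch ignores hashes equal to a sentinel"
    j = bisect_left(xs, v)  # now xs[j-1] < v <= xs[j]
    if xs[j] == v:
        return True, concat(xs[j - 1], xs[j])   # prove x_{i-1} || x_i in Y, with x_i = h(e)
    return False, concat(xs[j - 1], xs[j])      # prove x_i || x_{i+1} in Y, with x_i < h(e) < x_{i+1}

def check(e, is_member, y):
    """User side, after y in Y has been verified: does y support the claimed answer?"""
    lo, hi = y >> K, y & ((1 << K) - 1)
    v = h(e)
    return hi == v if is_member else lo < v < hi
```

Inserting a new element splits one interval into two, which is the "one deletion, two insertions" in Y described above.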



2.3 One-Way Accumulators

An important tool for our scheme is that of one-way RSA accumulator functions [?, 23, 41]. Such a function allows a source to digitally sign a collection of objects as opposed to a single object.

The use of one-way RSA accumulators originates with Benaloh and de Mare [6]. They show how to utilize an RSA one-way accumulator, which is also known as an exponential accumulator, to summarize a collection of data so that user verification responses have constant size. Refinements of the RSA accumulator used in our construction are given by Baric and Pfitzmann [?], Gennaro, Halevi and Rabin [23], and Sander, Ta-Shma and Yung [41].



As we show in the rest of this section, the RSA accumulator can be used to implement a static 
authenticated dictionary, where the set of elements is fixed. However, in a dynamic setting where 
items are inserted and deleted, the standard way of utilizing the RSA accumulator is inefficient. 
Several other researchers have also noted the inefficiency of this implementation in a dynamic setting (e.g., see [?]). Indeed, our solution can be viewed as refuting this previous intuition by showing that a more sophisticated utilization of the RSA accumulator can be made efficient even in a dynamic setting.

The most common form of one-way accumulator is defined by starting with a "seed" value $y_0$, which signifies the empty set, and then defining the accumulation value incrementally from $y_0$ for a set of values $X = \{x_1, \ldots, x_n\}$, so that $y_i = f(y_{i-1}, x_i)$, where f is a one-way function whose final value does not depend on the order of the $x_i$'s (e.g., see [6]). In addition, one desires that $y_i$ not be much larger to represent than $y_{i-1}$, so that the final accumulation value, $y_n$, is not too large. Because of the properties of function f, a source can digitally sign the value of $y_n$ so as to enable a third party to produce a short proof for any element $x_i$ belonging to X: namely, swap $x_i$ with $x_n$ and recompute $y_{n-1}$ from scratch; the pair $(x_i, y_{n-1})$ is then a cryptographically secure assertion for the membership of $x_i$ in set X.

A well-known example of a one-way accumulator function is the RSA accumulator,

$$f(y, x) = y^{x} \bmod N, \qquad (1)$$

for suitably chosen values of the seed $y_0$ and modulus N [6]. In particular, choosing N = PQ with P and Q being two strong primes [35] makes the RSA accumulator function as difficult to invert as RSA cryptography [?].
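A minimal sketch of Eq. (1) and of the swap-based proof described above. The modulus is a hypothetical toy value (the product of the primes 1009 and 1013), far too small for security; real moduli are thousands of bits long and their factors stay secret.

```python
from functools import reduce

N = 1009 * 1013  # hypothetical toy modulus; its factors would be kept secret
Y0 = 3           # seed y_0, relatively prime to N

def f(y: int, x: int) -> int:
    """RSA accumulator step, Eq. (1): f(y, x) = y^x mod N."""
    return pow(y, x, N)

def accumulate(xs, seed=Y0):
    """y_n = f(...f(f(y_0, x_1), x_2)..., x_n)."""
    return reduce(f, xs, seed)

X = [5, 7, 11, 13]
y_n = accumulate(X)

# Short proof for x_i = 7: move it to the end, i.e., accumulate everything else
# to obtain y_{n-1}; the pair (7, y_{n-1}) then satisfies f(y_{n-1}, 7) == y_n.
y_prev = accumulate([x for x in X if x != 7])
```

Order independence follows because exponents multiply: $(y^{x_1})^{x_2} = (y^{x_2})^{x_1} \bmod N$.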

The difficulty in using the RSA accumulator function in the context of authenticated dictionaries 
is that it is not associative; hence, any updates to set X require significant recomputations. 



2.4 Euler's Theorem 

There is an important technicality involved with the use of the RSA accumulator function, namely in the choice of the seed $a = y_0$. In particular, we should choose a relatively prime to P and Q. This choice is dictated by Euler's Theorem, which states

Theorem 2.1 (Euler's Theorem). $a^{\phi(N)} \bmod N = 1$, if $a > 1$ and $N > 1$ are relatively prime.

In our use of the RSA accumulator function, the following well-known corollary to Euler's 
Theorem will prove useful. 
Corollary 2.2. If $a > 1$ and $N > 1$ are relatively prime, then $a^{x} \bmod N = a^{x \bmod \phi(N)} \bmod N$, for all $x > 0$.
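The corollary follows in one line from the theorem. Writing $x = q\,\phi(N) + r$ with $0 \le r < \phi(N)$, so that $r = x \bmod \phi(N)$:

$$a^{x} = \left(a^{\phi(N)}\right)^{q} \cdot a^{r} \equiv 1^{q} \cdot a^{r} = a^{x \bmod \phi(N)} \pmod{N}.$$

As a small numeric check, take N = 15, so that $\phi(N) = 8$, and a = 2: $2^{11} \bmod 15 = 2048 \bmod 15 = 8$, and $2^{11 \bmod 8} = 2^{3} = 8$.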

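One implication of this corollary for the authenticated dictionary problem is that the source should never reveal the values of the prime numbers P and Q. Such a revelation would allow a directory to compute $\phi(N) = (P - 1)(Q - 1)$, which in turn could result in a false validation at a compromised directory: knowing $\phi(N)$, the directory can invert any exponent x that is relatively prime to $\phi(N)$ and thus compute x-th roots modulo N, producing a valid-looking witness for an element that is not in the set. So, our approach takes care to keep the values of P and Q only at the source.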
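The attack that motivates keeping P and Q secret can be sketched in a few lines. This is a hypothetical illustration with a toy modulus; `forge_witness` is not part of the scheme, and 13 stands in for the prime representative of an element that was never accumulated.

```python
from math import gcd

# Toy parameters (hypothetical); in the scheme only the source knows P and Q.
P, Q = 1009, 1013
N = P * Q
phi = (P - 1) * (Q - 1)   # the leaked value phi(N)
a = 3

A = pow(a, 5 * 7 * 11, N)  # accumulation of the honest representatives 5, 7, 11

def forge_witness(A: int, x: int, phi: int, N: int) -> int:
    """Given phi(N), compute an x-th root of A: w = A^(x^-1 mod phi(N)) mod N."""
    if gcd(x, phi) != 1:
        raise ValueError("x must be relatively prime to phi(N)")
    d = pow(x, -1, phi)    # modular inverse via built-in pow (Python 3.8+)
    return pow(A, d, N)

x_fake = 13                # representative of a non-member
w = forge_witness(A, x_fake, phi, N)
```

The resulting pair passes exactly the check a user performs, $w^{13} \equiv A \pmod{N}$, even though 13 was never accumulated.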



2.5 Two-Universal Hash Functions 

As in previous approaches [23, 41], we use the RSA accumulator in conjunction with two-universal hash functions. Such functions were first introduced by Carter and Wegman [15].

A family $H = \{h : A \rightarrow B\}$ of functions is two-universal if, for all $a_1, a_2 \in A$ with $a_1 \neq a_2$, and for a randomly chosen function h from H,

$$\Pr_{h \in H}\{h(a_1) = h(a_2)\} \le \frac{1}{|B|}.$$

In our scheme, the set A consists of 3k-bit vectors and the set B consists of k-bit vectors, and we are interested in finding random elements in the preimage of a two-universal function mapping A to B. We can use the two-universal function h(x) = Ux, where U is a $k \times 3k$ binary matrix and arithmetic is over GF(2). To get a representation of all the solutions for $h^{-1}(e)$, we need to solve a linear system. Once this is done, picking a random solution can be done by multiplying a bit matrix by a random bit vector, and takes $O(k^2)$ bit operations.
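Solving $Ux = e$ over GF(2) once and then sampling random preimages can be sketched as follows. Bit vectors are packed into Python integers (bit j of a row is column j), and the function names are hypothetical. Gauss-Jordan elimination yields a particular solution plus a basis of the null space of U; each sample XORs a random subset of that basis into the particular solution, which costs $O(k \cdot 3k) = O(k^2)$ bit operations, matching the bound above.

```python
import random

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def apply_h(U, x: int) -> int:
    """h(x) = Ux over GF(2); bit i of the result is the inner product of row i with x."""
    return sum(parity(row & x) << i for i, row in enumerate(U))

def preimage_sampler(U, e: int, ncols: int):
    """Return a function that draws a uniformly random x with Ux = e."""
    rows = list(U)
    rhs = [(e >> i) & 1 for i in range(len(U))]
    pivots, r = [], 0
    for c in range(ncols):  # Gauss-Jordan elimination over GF(2)
        p = next((i for i in range(r, len(rows)) if rows[i] >> c & 1), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        rhs[r], rhs[p] = rhs[p], rhs[r]
        for i in range(len(rows)):
            if i != r and rows[i] >> c & 1:
                rows[i] ^= rows[r]
                rhs[i] ^= rhs[r]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    if any(rhs[i] for i in range(r, len(rows))):  # a zero row with right-hand side 1
        raise ValueError("e is not in the image of h")
    x0 = sum(rhs[i] << c for i, c in enumerate(pivots))  # free variables set to 0
    basis = []
    for f in (c for c in range(ncols) if c not in pivots):
        v = 1 << f
        for i, c in enumerate(pivots):
            if rows[i] >> f & 1:
                v |= 1 << c
        basis.append(v)

    def sample(rng: random.Random) -> int:
        x = x0
        for v in basis:
            if rng.getrandbits(1):
                x ^= v
        return x
    return sample
```

In the scheme, each sampled x would then be tested for primality until a suitable prime representative is found.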



2.6 Choosing a Suitable Prime 

We are interested in obtaining a prime solution of the linear system that represents a two-universal 
hash function. The following lemma, of Gennaro et al. [23], is useful in this context:



Lemma 2.3 ([23]). Let H be a two-universal family from $\{0,1\}^{3k}$ to $\{0,1\}^{k}$. Then, for all but a $2^{-k}$ fraction of the functions $h \in H$, for every $e \in \{0,1\}^{k}$ a fraction of at least $c/k$ of the elements in $h^{-1}(e)$ are primes, for some small constant c.

For reasons that will become clear in the security proof given in Section 6, our scheme requires that a prime inverse be greater than $\sqrt{2^{3k}}$. Also, since the domain of H is $\{0,1\}^{3k}$, this prime is less than $2^{3k}$. So, by the results of prime number theory, the density of big prime numbers that are less than $2^{3k}$ is $\Theta(1/k)$ for all but a $2^{-\Omega(k)}$ fraction of functions in family H. The expected number of steps to find a suitable prime is O(k). In order to find a suitable prime with high probability $1 - 2^{-\Omega(k)}$, we need to sample $O(k^2)$ times: each sample succeeds with probability $\Omega(1/k)$, so $O(k^2)$ independent samples all fail with probability at most $(1 - c/k)^{O(k^2)} = 2^{-\Omega(k)}$.

Recall from Section 2.5 that picking a random solution takes $O(k^2)$ bit operations. Thus, the total running time of finding a suitable prime is dominated by running $O(k^2)$ primality tests.

One needs to be careful about the choice of primality test, because the cost of prime generation and verification could dominate the cost of signing. One could use the Miller-Rabin test, for example. To reduce the probability of mistaking a composite number for a prime, one could perform a number of additional Miller-Rabin tests. Performing these tests could be costly.

Fortunately, Cramer and Shoup [18] give a fast primality testing algorithm that can be used here. It performs additional tests between runs of the Miller-Rabin algorithm that reduce the primality checking time. They also state that empirical runs of the algorithm indicate running times that are suitable for signing schemes.
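For reference, a standard Miller-Rabin test is sketched below. It is not the Cramer-Shoup procedure cited above; it only illustrates the general pattern of cheap filtering (here, trial division by a few small primes) followed by repeated Miller-Rabin rounds. The round count of 40 is an arbitrary illustrative choice.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(n: int, rounds: int = 40, rng=None) -> bool:
    """Miller-Rabin: False means n is composite; True means n is prime except
    with probability at most 4**-rounds."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:          # cheap filter before the expensive rounds
        if n % p == 0:
            return n == p
    rng = rng or random.Random()
    d, s = n - 1, 0                 # write n - 1 = 2^s * d with d odd
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False            # a is a witness that n is composite
    return True
```

Each extra round lowers the error bound by a factor of four, which is the cost/accuracy tradeoff the text refers to.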
2.7 The Strong RSA Assumption



The proof of security of our scheme uses the strong RSA assumption, as defined by Baric and Pfitzmann [?]. Given N and $x \in \mathbb{Z}_N^*$, the strong RSA problem consists of finding integers f, with $2 \le f < N$, and a, such that $a^{f} = x \bmod N$. The difference between this problem and the standard RSA problem is that the adversary is given the freedom to choose not only the base a but also the exponent f.

Strong RSA Assumption: There exists a probabilistic algorithm B that on input $1^r$ outputs an RSA modulus N such that, for all probabilistic polynomial-time algorithms D, all c > 0, and all sufficiently large r, the probability that algorithm D on a random input $x \in \mathbb{Z}_N$ outputs a and $f \ge 2$ such that $a^{f} = x \bmod N$ is no more than $r^{-c}$.

In other words, given N and a randomly chosen element x, it is infeasible to find a and f such that $a^{f} = x \bmod N$.



2.8 A Straightforward Accumulator-Based Scheme 

Let $S = \{e_1, e_2, \ldots, e_n\}$ be the set of elements stored at the source. Each element e is represented by k bits. The source chooses strong primes [35] P and Q that are suitably large, e.g., $P, Q > 2^{3k/2}$. It then chooses a suitably large base a that is relatively prime to N = PQ. Note that N is at least $2^{3k}$. It also chooses a random hash function h from a two-universal family (as discussed in Section 2.5). The source broadcasts a, N and h once to the directories and users, but keeps P and Q secret. At periodic time intervals, for each element $e_i$ of S, the source computes the representative of $e_i$, denoted $x_i$, where $h(x_i) = e_i$ and $x_i$ is a prime chosen as described in Section 2.6. The source then combines the representatives of the elements by computing the RSA accumulation

$$A = a^{x_1 x_2 \cdots x_n} \bmod N$$

and broadcasts to the directories a signed message (A, t), where t is the current timestamp.
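The source-side setup and accumulation can be sketched as follows. All numeric values are hypothetical toy choices: P and Q are far too small and are not strong primes, and `reps` stands in for the prime representatives that the scheme would obtain through the two-universal hash as in Section 2.6.

```python
from math import gcd, prod

# Toy parameters (hypothetical); the source publishes a and N and keeps P, Q secret.
P, Q = 1009, 1013
N = P * Q
a = 3
assert gcd(a, N) == 1, "the base must be relatively prime to N"

# Hypothetical prime representatives x_1, ..., x_n of the elements e_1, ..., e_n.
reps = [101, 103, 107, 109]

def accumulation(a: int, reps, N: int) -> int:
    """A = a^(x_1 x_2 ... x_n) mod N, computed with one exponentiation per representative."""
    A = a
    for x in reps:
        A = pow(A, x, N)
    return A

A = accumulation(a, reps, N)
# Same value via a single exponentiation with the full product as exponent (math.prod, Python 3.8+).
A_direct = pow(a, prod(reps), N)
# The source would now sign (A, t) and broadcast it to the directories.
```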



2.8.1 Query 

When asking for a proof of membership in S of an element $e_i$, the user submits $e_i$ to a directory. To prove that a query element is in S, a directory computes the value

$$A_i \leftarrow a^{x_1 x_2 \cdots x_{i-1} x_{i+1} \cdots x_n} \bmod N. \qquad (2)$$

That is, $A_i$ is the accumulation of all the representatives of the elements of S besides $x_i$, and is said to be the witness of $e_i$. After computing $A_i$, the directory returns to the user the representative $x_i$, the witness $A_i$ and the pair (A, t), signed by the source. Note that this query authentication information has constant size; hence, this scheme is size-oblivious. However, computing witness $A_i$ is not a trivial task for the directory, for it must perform n - 1 exponentiations to answer a query. Making the simplifying assumption that the number of bits needed to represent N is independent of n, the computation performed to answer a single query takes O(n) exponentiations.
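A sketch of the directory's witness computation of Eq. (2), with the same kind of hypothetical toy values; `witness` is an illustrative name. The loop makes the O(n) cost visible: one modular exponentiation for every representative other than $x_i$.

```python
# Toy public values (hypothetical); the directory knows a, N and the representatives.
N = 1009 * 1013
a = 3
reps = [101, 103, 107, 109]   # hypothetical prime representatives x_1, ..., x_n

def witness(a: int, reps, i: int, N: int) -> int:
    """A_i = a^(x_1 ... x_{i-1} x_{i+1} ... x_n) mod N, using n - 1 exponentiations."""
    w = a
    for j, x in enumerate(reps):
        if j != i:
            w = pow(w, x, N)
    return w

A = pow(a, 101 * 103 * 107 * 109, N)   # accumulation signed by the source
A_2 = witness(a, reps, 2, N)           # witness for the element whose representative is 107
```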



2.8.2 Verification 

The user checks that timestamp $t$ is current and that $(A, t)$ is indeed signed by the source. It then checks that $x_i$ is a valid representative of $e_i$, i.e., $h(x_i) = e_i$. Finally, it computes $A' = A_i^{x_i} \bmod N$ and compares it to $A$. If $A' = A$, then the user is reassured of the validity of the answer because of the strong RSA assumption. The verification needs only $O(1)$ exponentiations.
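The query and verification steps above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the scheme's implementation: the primes $P = 1009$ and $Q = 1013$, the base $a = 2$, and the representatives are hypothetical and far too small to be secure, and the hash check $h(x_i) = e_i$, the signature on $(A, t)$, and the timestamp check are omitted. Python's built-in three-argument `pow` performs the modular exponentiations.

```python
# Toy sketch of the straightforward RSA-accumulator scheme (Section 2.8).
# Hypothetical parameters: tiny primes P, Q, base a = 2 and hand-picked prime
# representatives; hashing, signatures and timestamps are omitted.
P, Q = 1009, 1013          # toy primes (real schemes use large strong primes)
N = P * Q
a = 2                      # base, relatively prime to N


def accumulate(base, reps, modulus=N):
    """Return base^(x_1 * x_2 * ... * x_k) mod modulus, one exponentiation per x."""
    value = base
    for x in reps:
        value = pow(value, x, modulus)
    return value


def witness(reps, i):
    """Directory: A_i = a^(product of every x_j with j != i) mod N (n - 1 exponentiations)."""
    return accumulate(a, reps[:i] + reps[i + 1:])


def verify(A, x_i, A_i):
    """User: a single exponentiation, checking A_i^(x_i) mod N == A."""
    return pow(A_i, x_i, N) == A


reps = [13, 17, 19, 29, 31]    # prime representatives x_1, ..., x_5
A = accumulate(a, reps)        # computed and signed by the source
A_3 = witness(reps, 2)         # witness for the element whose representative is 19
print(verify(A, 19, A_3))      # True
```

The directory's linear cost shows up in `witness`, which performs one exponentiation per remaining element, while `verify` performs exactly one.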
2.8.3 Updates



For updates, the above simple approach has asymmetric performance (for unrestricted values of accumulated elements), with insertions being much easier than deletions. To insert a new element $e_{n+1}$ into the set $S$, the source simply recomputes the accumulation $A$ as follows:

$$A \leftarrow A^{x_{n+1}} \bmod N,$$

where $x_{n+1}$ is the (prime) representative of $e_{n+1}$. The computation of $x_{n+1}$ can be done in time that is independent of $n$ (see Section 2.6), and updating $A$ takes a single exponentiation. An updated signed pair $(A, t)$ is then sent to the directories in the next time interval. Thus, an insertion takes $O(1)$ time, counting exponentiations and other modular arithmetic operations as constant-time operations. The deletion of an element $e_i \in S$, on the other hand, will in general require the source to recompute the new value $A$ by performing $n - 1$ exponentiations. That is, a deletion takes $O(n)$ time.
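A hedged sketch of the asymmetric update costs, using a hypothetical toy modulus: an insertion is a single exponentiation of the current accumulation, whereas a deletion, without knowledge of $\phi(N)$, falls back to recomputing $A$ from the remaining representatives.

```python
# Toy sketch of insertions and deletions in the straightforward scheme.
# Hypothetical tiny modulus; the source does NOT use phi(N) here.
N = 1009 * 1013
a = 2


def accumulate(reps):
    """a^(product of reps) mod N, one exponentiation per representative."""
    value = a
    for x in reps:
        value = pow(value, x, N)
    return value


def insert(A, x_new):
    """Insertion: A <- A^(x_{n+1}) mod N, a single exponentiation."""
    return pow(A, x_new, N)


def delete(reps, x_old):
    """Deletion without phi(N): recompute A over the n - 1 remaining representatives."""
    remaining = [x for x in reps if x != x_old]
    return remaining, accumulate(remaining)


reps = [13, 17, 19]
A = accumulate(reps)
A = insert(A, 29)              # O(1) exponentiations
reps.append(29)
reps, A = delete(reps, 17)     # O(n) exponentiations
```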

The performance of the above straightforward use of the RSA accumulator is summarized in Table 1.



space | insertion time | deletion time | update info | query time | query info | verify time
O(n) | O(1) | O(n) | O(1) | O(n) | O(1) | O(1)



Table 1: Straightforward implementation of an authenticated dictionary using an RSA accumulator. Each of these running times counts modular exponentiations and other arithmetic operations as constant-time operations; hence, they can also be viewed in terms of alternative bounds by using the above bounds as characterizing the number of such modular operations.

Of course, if a representative $x_i$ is relatively prime to $P - 1$ and $Q - 1$, the source can delete $e_i$ by computing $x \leftarrow x_i^{-1} \bmod \phi(N)$ (via the extended Euclidean algorithm) and then updating $A \leftarrow A^{x} \bmod N$. But we cannot guarantee that $x_i$ has an inverse in $\mathbb{Z}_{\phi(N)}$ if it is an accumulation of a group of elements; hence, we do not advocate using this approach for deletions. Indeed, we will not assume the existence of multiplicative inverses in $\mathbb{Z}_{\phi(N)}$ for any of our solutions. Thus, we are stuck with linear deletion time at the source and linear query time at a directory when making this straightforward application of RSA accumulators to the authenticated dictionary problem.
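The inverse-based deletion can be sketched as follows, assuming hypothetical toy primes $P = 1009$ and $Q = 1013$; the helper `delete_with_inverse` is illustrative only. Here $\phi(N) = 2^6 \cdot 3^2 \cdot 7 \cdot 11 \cdot 23$, so a representative such as 3 or 7 has no inverse, which is a small instance of why this approach cannot be relied on.

```python
from math import gcd

# Toy parameters (hypothetical): only the source knows phi(N).
P, Q = 1009, 1013
N = P * Q
phi = (P - 1) * (Q - 1)        # 1008 * 1012 = 2^6 * 3^2 * 7 * 11 * 23
a = 2


def accumulate(reps):
    value = a
    for x in reps:
        value = pow(value, x, N)
    return value


def delete_with_inverse(A, x_i):
    """Source-only deletion: A <- A^(x_i^{-1} mod phi(N)) mod N.

    Valid only when gcd(x_i, phi(N)) == 1, i.e., x_i is coprime to P - 1 and Q - 1.
    """
    if gcd(x_i, phi) != 1:
        raise ValueError("x_i has no inverse modulo phi(N)")
    x_inv = pow(x_i, -1, phi)  # modular inverse (Python 3.8+)
    return pow(A, x_inv, N)


A = accumulate([13, 17, 19])
print(delete_with_inverse(A, 17) == accumulate([13, 19]))   # True
```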

The above linear query time is generally considered too slow to be efficient for processing large 
numbers of queries. We describe in the next section an alternative approach that can answer queries 
much faster. 

3 Precomputed Accumulations 

We present a first improvement that allows for fast query processing. We require the directory to store a precomputed witness $A_i$ for each element $e_i$ of $S$, as defined in Eq. (2). Thus, to answer a query, a directory looks up the $A_i$ value, rather than computing it from scratch, and then completes the transaction as described in the previous section. Thanks to the precomputation of the witnesses at the source, a directory can process any query in $O(1)$ time (with no exponentiations), while the verification computation for a user remains unchanged.

Unfortunately, a standard way of implementing this approach is inefficient for processing updates. In particular, a directory now takes $O(n)$ exponentiations to process a single insertion, since it needs to update all the existing witnesses and compute a new witness from scratch, and $O(n \log n)$ exponentiations to process a single deletion, since after a deletion the directory must recompute all the witnesses, which can be done using the algorithm given in [41]. Thus, at first glance, this precomputed accumulations approach appears to be quite inefficient when updates to the set $S$ are required.
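For reference, all $n$ witnesses can be computed without $\phi(N)$ by a standard divide-and-conquer technique using $O(n \log n)$ exponentiations. The sketch below illustrates that bound under hypothetical toy parameters; we do not claim it reproduces the algorithm of [41] exactly.

```python
# Divide-and-conquer computation of all witnesses without phi(N) (toy parameters).
N = 1009 * 1013                # hypothetical tiny modulus
a = 2


def raise_all(base, reps):
    """base^(product of reps) mod N, one exponentiation per representative."""
    for x in reps:
        base = pow(base, x, N)
    return base


def all_witnesses(base, reps):
    """Return [A_1, ..., A_n], where A_i = base^(prod_{j != i} x_j) mod N.

    Before recursing on one half, the other half is folded into the base, so
    each recursion level costs n exponentiations: O(n log n) in total.
    """
    if len(reps) <= 1:
        return [base] * len(reps)
    mid = len(reps) // 2
    left, right = reps[:mid], reps[mid:]
    return (all_witnesses(raise_all(base, right), left)
            + all_witnesses(raise_all(base, left), right))


reps = [13, 17, 19, 29, 31, 37]
W = all_witnesses(a, reps)
```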

We can process updates with fewer than $O(n \log n)$ exponentiations, however, by enlisting the help of the source. Our method can in fact be implemented with $O(n)$ exponentiations by a simple two-phase approach. The details of the two phases follow.

3.1 First Phase 

Let $S$ be the set of $n$ items stored at the source after performing all the insertions and deletions required in the previous time interval. We build a complete binary tree $T$ "on top" of the representative values of the elements of $S$, so that each leaf of $T$ is associated with the representative $x_i$ of an element $e_i$ of $S$. In the first phase, we perform a post-order traversal of $T$, so that each node $v$ in $T$ is visited only after its children are visited. The main computation performed during the visit of a node $v$ is to compute a value $x(v)$. If $v$ is a leaf of $T$ storing some representative $x_i$, then we compute

$$x(v) \leftarrow x_i \bmod \phi(N).$$

If $v$ is an internal node of $T$ with children $u$ and $w$ (we can assume $T$ is proper, so that each internal node has two children), then we compute

$$x(v) \leftarrow x(u)\, x(w) \bmod \phi(N).$$

When we have computed $x(r)$, where $r$ denotes the root of $T$, we are done with this first phase. Since a post-order traversal takes $O(n)$ time, and each visit computation in our traversal takes $O(1)$ time, this entire first phase runs in $O(n)$ time. We again make the simplifying assumption that the number of bits needed to represent $N$ is independent of $n$.

3.2 Second Phase 

In the second phase, we perform a pre-order traversal of $T$, where the visit of a node $v$ involves the computation of a value $A(v)$. The value $A(v)$ for a node $v$ is defined to be the accumulation of all the values stored at leaves of $T$ that are not descendants of $v$ (where a leaf $v$ counts as a descendant of itself). Thus, if $v$ is a leaf associated with the representative value $x_i$ of some element of $S$, then $A(v) = A_i$. Recall that in a pre-order traversal, we perform the visit action on each node $v$ before we perform the respective visit actions for $v$'s children. For the root $r$ of $T$, we define $A(r) = a$. For any non-root node $v$, let $z$ denote $v$'s parent and let $w$ denote $v$'s sibling (note that since $T$ is proper, every node but the root has a sibling). Given $A(z)$ and $x(w)$, we can compute the value $A(v)$ for $v$ as follows:

$$A(v) \leftarrow A(z)^{x(w)} \bmod N.$$

By Corollary 2.2, we can inductively prove that each $A(v)$ equals the accumulation of all the values stored at non-descendants of $v$. Since a pre-order traversal of $T$ takes $O(n)$ time, and each visit action can be performed with $O(1)$ exponentiations, we can compute all the $A_i$ witnesses with $O(n)$ exponentiations. Note that implementing this algorithm requires knowledge of the value $\phi(N)$, which presumably only the source knows. Thus, this computation can only be performed at the source, which must transmit the updated $A_i$ values to the directory.
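The two phases can be combined into one short sketch. It assumes hypothetical toy primes, builds the proper binary tree $T$ implicitly by halving index ranges (a balanced tree, not necessarily a complete one), computes $x(v)$ in post-order, and then propagates $A(v)$ in pre-order with $A(\text{child}) = A(\text{parent})^{x(\text{sibling})} \bmod N$. Reducing exponents modulo $\phi(N)$ is safe here because $a$ is relatively prime to $N$.

```python
# Two-phase witness computation at the source (Sections 3.1 and 3.2), toy parameters.
P, Q = 1009, 1013                   # hypothetical tiny primes
N, phi = P * Q, (P - 1) * (Q - 1)   # phi(N) is known only to the source
a = 2                               # base, relatively prime to N


def two_phase_witnesses(reps):
    """Return every witness A_i using O(n) multiplications and exponentiations.

    The node for reps[lo:hi] has children reps[lo:mid] and reps[mid:hi].
    """
    if not reps:
        return []
    x = {}                              # x[(lo, hi)] = x(v), a product mod phi(N)

    def first_phase(lo, hi):            # post-order: children before parent
        if hi - lo == 1:
            x[(lo, hi)] = reps[lo] % phi
            return
        mid = (lo + hi) // 2
        first_phase(lo, mid)
        first_phase(mid, hi)
        x[(lo, hi)] = x[(lo, mid)] * x[(mid, hi)] % phi

    witnesses = [None] * len(reps)

    def second_phase(lo, hi, A_v):      # pre-order: A(v) is known before the children
        if hi - lo == 1:
            witnesses[lo] = A_v         # at a leaf, A(v) = A_i
            return
        mid = (lo + hi) // 2
        second_phase(lo, mid, pow(A_v, x[(mid, hi)], N))   # A(child) = A(parent)^x(sibling)
        second_phase(mid, hi, pow(A_v, x[(lo, mid)], N))

    first_phase(0, len(reps))
    second_phase(0, len(reps), a)       # A(root) = a
    return witnesses


reps = [13, 17, 19, 29, 31, 37, 41]
W = two_phase_witnesses(reps)
```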

The performance of the precomputed accumulation scheme is summarized in Table 2.
In the next section, we show how to combine this approach with the straightforward approach of Section 2.8 to design a scheme that is efficient for both updates and queries.
space | insertion time | deletion time | update info | query time | query info | verify time
O(n) | O(n) | O(n) | O(n) | O(1) | O(1) | O(1)



Table 2: Precomputed accumulation scheme for implementing an authenticated dictionary with an RSA accumulator. Each of these running times counts modular exponentiations and other arithmetic operations as constant-time operations; hence, they can also be viewed in terms of alternative bounds by using the above bounds as characterizing the number of such modular operations.

4 Parameterized Accumulations 

Consider again the problem of designing an accumulator-based authenticated dictionary for a set $S = \{e_1, e_2, \ldots, e_n\}$. In this section, we show how to balance the processing between the source and the directory, depending on their relative computational power. The main idea is to choose an integer parameter $1 < p < n$ and partition the set $S$ into $p$ groups of roughly $n/p$ elements each, performing the straightforward approach inside each group and the precomputed accumulations approach among the groups (see Figure 2). The details are as follows.
Figure 2: Parameterized accumulations scheme.



4.1 Subdividing the Dictionary 

Divide the set $S$ into $p$ groups, $Y_1, Y_2, \ldots, Y_p$, of roughly $n/p$ elements each, balancing the size of the groups as much as possible. For group $Y_j$, let $y_j$ denote the product of the representatives of the elements in $Y_j$ modulo $\phi(N)$. Define $B_j$ as

$$B_j = a^{y_1 \cdots y_{j-1} y_{j+1} \cdots y_p} \bmod N.$$

That is, $B_j$ is the accumulation of the representatives of all the elements that are not in the set $Y_j$. After any insertion or deletion in a set $Y_j$, the source can compute a new value $y_j$ with $O(n/p)$ exponentiations. (We show in Section 4.2 how, with some effort, this bound can be improved to $O(\log(n/p))$ exponentiations.) Moreover, since the source knows the value of $\phi(N)$, it can update all the $B_j$ values after such an update in $O(p)$ time. Thus, the source can process an update operation in $O(p + n/p)$ time, assuming that the update does not require redistributing elements among the groups and counting modular exponentiations as constant-time operations.
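The text does not spell out how the source refreshes all the $B_j$ values in $O(p)$ time. One way, sketched below under hypothetical toy parameters, is to combine prefix and suffix products of the $y_j$ modulo $\phi(N)$, which avoids modular inverses entirely; the function names are illustrative only.

```python
# Source-side computation of all B_j values (Section 4.1), toy parameters.
P, Q = 1009, 1013
N, phi = P * Q, (P - 1) * (Q - 1)   # phi(N) is known only to the source
a = 2


def group_products(groups):
    """y_j = product of the representatives in group Y_j, mod phi(N)."""
    ys = []
    for group in groups:
        y = 1
        for x in group:
            y = y * x % phi
        ys.append(y)
    return ys


def all_B(ys):
    """B_j = a^(prod_{l != j} y_l) mod N for every j.

    Prefix/suffix products mod phi(N) need O(p) multiplications, followed by
    p exponentiations mod N; no division (and no inverse) is required.
    """
    p = len(ys)
    prefix = [1] * (p + 1)     # prefix[j] = ys[0] * ... * ys[j-1]
    suffix = [1] * (p + 1)     # suffix[j] = ys[j] * ... * ys[p-1]
    for j in range(p):
        prefix[j + 1] = prefix[j] * ys[j] % phi
    for j in range(p - 1, -1, -1):
        suffix[j] = suffix[j + 1] * ys[j] % phi
    return [pow(a, prefix[j] * suffix[j + 1] % phi, N) for j in range(p)]


groups = [[13, 17], [19, 29], [31, 37, 41]]   # Y_1, Y_2, Y_3
B = all_B(group_products(groups))
```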

Maintaining the size of each set $Y_j$ is not a major overhead. We need only keep the invariant that each $Y_j$ has at least $\lceil n/p \rceil / 2$ elements and at most $2\lceil n/p \rceil$ elements. If a set $Y_j$ becomes too small, then we either merge it with one of its adjacent sets $Y_{j-1}$ or $Y_{j+1}$, or (if merging $Y_j$ with such a set would cause an overflow) we "borrow" some of the elements from an adjacent set to bring the size of $Y_j$ to at least $3\lceil n/p \rceil / 4$. Likewise, if a set $Y_j$ grows too large, then we simply split it in two. These simple adjustments take $O(n/p)$ time and maintain the invariant that each $Y_j$ has size $\Theta(n/p)$. Of course, this assumes that the value of $n$ does not change significantly as we insert and remove elements. But even this condition is easily handled. Specifically, we can maintain the sizes of the $Y_j$'s in a priority queue that keeps track of the smallest and largest $Y_j$ sets. Whenever we increase $n$ by an insertion, we can check the priority queue to see if the smallest set now must do some merging or borrowing to keep from becoming too small. Likewise, whenever we decrease $n$ by a deletion, we can check the priority queue to see if the largest set now must split. An inductive argument shows that this approach keeps the size of each group $\Theta(n/p)$.
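The size invariant can be maintained with a few list operations. The following sketch is a simplification under stated assumptions: groups are plain Python lists kept in sorted order, the priority queue is omitted, and `fix_group` (a hypothetical helper) repairs a single group after an update by splitting it, merging it with an adjacent group, or borrowing from that group.

```python
from math import ceil

# Hypothetical sketch of the size invariant for the groups Y_1, ..., Y_p (Section 4.1).
# Groups are Python lists kept in sorted order; the priority queue is omitted.


def fix_group(groups, j, n, p):
    """Restore ceil(n/p)/2 <= |Y_j| <= 2*ceil(n/p) for group j, in place."""
    c = ceil(n / p)
    low, high = c / 2, 2 * c
    g = groups[j]
    if len(g) > high:                                   # too large: split in two
        mid = len(g) // 2
        groups[j:j + 1] = [g[:mid], g[mid:]]
    elif len(g) < low and len(groups) > 1:              # too small
        k = j + 1 if j + 1 < len(groups) else j - 1     # an adjacent group
        if len(g) + len(groups[k]) <= high:             # merge if it does not overflow
            lo, hi = min(j, k), max(j, k)
            groups[lo:hi + 1] = [groups[lo] + groups[hi]]
        else:                                           # otherwise borrow elements
            need = max(1, ceil(3 * c / 4) - len(g))     # bring Y_j up to 3*ceil(n/p)/4
            if k > j:                                   # from the front of the right neighbour
                g.extend(groups[k][:need])
                del groups[k][:need]
            else:                                       # from the back of the left neighbour
                g[:0] = groups[k][-need:]
                del groups[k][-need:]
    return groups
```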

Keeping each $Y_j$ of size exactly $\Theta(n/p)$ is admittedly an extra overhead. In practice, however, all this overhead can probably be ignored, as it is likely that the $Y_j$'s will grow and shrink at more or less the same rate. Indeed, even if the updates are non-uniform, we can afford to completely redistribute the elements in all the $Y_j$'s as often as every $O(\min\{p, n/p\})$ updates, amortizing the $O(n)$ cost of this redistribution over the updates that occurred since the last redistribution.

Turning to the task at a directory, we recall that a directory receives all $p$ of the $B_j$ values after an update occurs. Thus, a directory can perform its part of an update computation in $O(p)$ time. It validates that some $e_i$ is in $S$ by first determining the group $Y_j$ containing $e_i$, which can be done by table look-up. Then, it computes $A_i$ as

$$A_i \leftarrow B_j^{\prod_{e_m \in Y_j - \{e_i\}} x_m} \bmod N,$$

where $x_m$ is the representative of $e_m$. Thus, a directory can answer a query with $O(n/p)$ exponentiations.
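The directory's side of a query needs no secret information: starting from the $B_j$ received from the source, it raises to the representatives of the other members of $Y_j$. A toy sketch, with hypothetical parameters and group contents:

```python
# Directory-side query in the parameterized scheme (toy parameters, no phi(N)).
N = 1009 * 1013
a = 2


def accumulate(base, reps):
    """base^(product of reps) mod N, one exponentiation per representative."""
    for x in reps:
        base = pow(base, x, N)
    return base


def answer_query(B_j, group, i):
    """A_i = B_j^(prod of x_m over e_m in Y_j, m != i) mod N: |Y_j| - 1 exponentiations."""
    return accumulate(B_j, group[:i] + group[i + 1:])


Y1, Y2 = [13, 17], [19, 29, 31]          # representatives of two groups
A = accumulate(a, Y1 + Y2)               # signed accumulation
B2 = accumulate(a, Y1)                   # B_2: everything outside Y_2 (sent by the source)
A_i = answer_query(B2, Y2, 1)            # witness for the element represented by 29
print(pow(A_i, 29, N) == A)              # True
```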

The performance of the parameterized accumulation algorithm is summarized in Table 3.
space | insertion time | deletion time | update info | query time | query info | verify time
O(n) | O(p + n/p) | O(p + n/p) | O(p) | O(n/p) | O(1) | O(1)



Table 3: Parameterized accumulations scheme for implementing an authenticated dictionary using an RSA accumulator. We denote with $p$ an integer such that $1 < p < n$. Each of these running times counts modular exponentiations and other arithmetic operations as constant-time operations; hence, they can also be viewed in terms of alternative bounds by using the above bounds as characterizing the number of such modular operations.

The parameter $p$ allows us to balance the work between the source and the directories, and also between updates and queries. For example, we can set $p = \lceil \sqrt{n} \rceil$, which gives $O(\sqrt{n})$ time for both queries and updates. Note that for reasonable values of $n$, say for $n$ between 10,000 and 1,000,000, $\sqrt{n}$ is between 100 and 1,000. In many cases, this is enough of a reduction to make the dynamic RSA accumulator practical for the source and directories, while still keeping the user computation to one exponentiation and one signature verification. Indeed, these user computations are simple enough to be embedded even in a smart card, a PDA, or a mobile phone.
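As a quick arithmetic check of the balance point, ignoring constant factors and counting modular operations only, the following snippet evaluates the update cost $p + n/p$ and the query cost $n/p$ at $p = \lfloor\sqrt{n}\rfloor$ for the two values of $n$ mentioned above; `costs` is a hypothetical helper.

```python
from math import isqrt


def costs(n, p):
    """Approximate operation counts: (update ~ p + n/p, query ~ n/p)."""
    return p + n // p, n // p


for n in (10_000, 1_000_000):
    p = isqrt(n)
    print(n, p, costs(n, p))
# 10000 100 (200, 100)
# 1000000 1000 (2000, 1000)
```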

4.2 Improving the Update Time for the Source 

In this section, we show how the source can further improve the performance of an update operation in the parameterized scheme. Recall that in this scheme the set $S$ is partitioned into $p$ subsets, $Y_1, Y_2, \ldots, Y_p$, and the source maintains for each $Y_j$ a value $B_j$, on behalf of the directories, that is the accumulation of all the values not in $Y_j$. Also recall that, for each group $Y_j$, we let $y_j$ denote the product of the items in $Y_j$ modulo $\phi(N)$. In the algorithm described above, the source recomputes $y_j$ from scratch after any update occurs, which takes $O(n/p)$ exponentiations. We will now describe how this computation can be done with $O(\log(n/p))$ exponentiations.

The method is for the source to store the elements of each $Y_j$ in a balanced binary search tree $T_j$. For each internal node $w$ in $T_j$, the source maintains the value $y(w)$, which is the product of the representatives of all the items stored at descendants of $w$, modulo $\phi(N)$. Thus, $y(r(T_j)) = y_j$, where $r(T_j)$ denotes the root of $T_j$. Any insertion or deletion will affect only $O(\log(n/p))$ nodes $w$ in $T_j$, for which we can recompute their $y(w)$ values in $O(\log(n/p))$ total time. Therefore, after any update, the source can recompute a $y_j$ value in $O(\log(n/p))$ time, assuming that the size of the $Y_j$'s does not violate the size invariant. Still, if the size of $Y_j$ after an update violates the size invariant, we can easily adjust it by performing appropriate splits and joins on the trees representing $Y_j$, $Y_{j-1}$, and/or $Y_{j+1}$. Moreover, we can rebuild the entire set of trees after every $O(n/p)$ updates, to keep the sizes of the $Y_j$ sets $\Theta(n/p)$, with the cost of this periodic adjustment (which will probably not even be necessary in practice for most applications) being amortized over the previous updates. The performance of the resulting scheme is summarized in Table 4.
space | insertion time | deletion time | update info | query time | query info | verify time
O(n) | O(p + log(n/p)) | O(p + log(n/p)) | O(p) | O(n/p) | O(1) | O(1)



Table 4: Enhanced parameterized scheme for implementing an authenticated dictionary using an RSA accumulator. We denote with $p$ an integer such that $1 < p < n$. Each of these running times counts modular exponentiations and other arithmetic operations as constant-time operations; hence, they can also be viewed in terms of alternative bounds by using the above bounds as characterizing the number of such modular operations.
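The per-group product structure of Section 4.2 can be sketched as below. As a stated simplification, it replaces the balanced binary search tree $T_j$ with a hypothetical fixed-capacity, array-based product tree addressed by slot (splits and joins are not shown). The invariant is the same: each internal node holds the product of the representatives below it modulo $\phi(N)$, so an update touches only the nodes on one leaf-to-root path.

```python
# Hypothetical sketch: a fixed-capacity, array-based product tree for one group Y_j.
P, Q = 1009, 1013
phi = (P - 1) * (Q - 1)          # known only to the source


class ProductTree:
    """Each internal node stores the product of the representatives below it, mod phi(N).

    The root stores y_j; an empty leaf slot holds 1, the multiplicative identity.
    """

    def __init__(self, capacity):
        self.size = 1
        while self.size < capacity:
            self.size *= 2
        self.node = [1] * (2 * self.size)   # node[1] is the root, leaves start at size

    def set(self, slot, rep):
        """Insert rep at a leaf slot (rep = 1 deletes); O(log capacity) multiplications."""
        k = slot + self.size
        self.node[k] = rep % phi
        k //= 2
        while k >= 1:
            self.node[k] = self.node[2 * k] * self.node[2 * k + 1] % phi
            k //= 2

    def y(self):
        return self.node[1]


tree = ProductTree(capacity=8)
for slot, rep in enumerate([13, 17, 19, 29, 31]):
    tree.set(slot, rep)
tree.set(2, 1)                   # delete the element represented by 19
```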

In this version of our scheme, we can achieve a complete tradeoff between the cost of updates 
at the source and queries at the directories. Tuning the parameter p over time, therefore, could 
yield the optimal balance between the relative computational powers of the source and directories. 
It could also be used to balance between the number of queries and updates in the time intervals. 

Theorem 4.1. The parameterized accumulations scheme for implementing an authenticated dictionary over a set of size $n$ uses data structures with $O(n)$ space at the source and directories and has the following performance, for a given parameter $p$ such that $1 < p < n$:

• the insertion and deletion operations for the source each require $O(p + \log(n/p))$ exponentiations;

• the update authentication information has size $O(p)$;

• answering a query by a directory requires $O(n/p)$ exponentiations;

• the query authentication information has size $O(1)$; and

• the verification for a user requires only $O(1)$ exponentiations.

Thus, for $p = \sqrt{n}$, one can balance insertion time, deletion time, update authentication information size, and query time to achieve an $O(\sqrt{n})$ bound, while keeping the query authentication information size and the verification time constant.

The parameterized accumulations scheme described in this section significantly reduces the overhead at the source and directories of using an RSA accumulator to solve the authenticated dictionary problem. Moreover, this improvement was achieved without any modification to the client from the original straightforward application of the RSA accumulator described in Section 2.8. In the next section, we show that if we are allowed to slightly modify the computation at the client, we can further improve performance at the source and directory while still implementing a size-oblivious scheme.

5 Hierarchical Accumulations 

In this section, we describe a hierarchical accumulation scheme for implementing an authenticated dictionary on a set $S$ with $n$ elements. In this scheme, the verification algorithm consists of performing a series of $c$ exponentiations, where $c$ is a fixed constant for the scheme (see Figure 3). Note that the approach of Section 4 assumed that $c = 1$ (not counting the exponentiation needed to verify the source's digital signature of the pair $(A, t)$ if the RSA signature scheme is used).




Figure 3: Hierarchical accumulations scheme with parameter c = 2. 

Given a fixed constant $c$, we define $p = n^{1/(c+1)}$ and construct the following hierarchical partition of $S$:

• We begin by partitioning the set $S$ into $p$ subsets of $n_1 = n^{c/(c+1)}$ elements each, called level-1 subsets.

• For $i = 1, \ldots, c - 1$, we partition each level-$i$ subset into $p$ subsets of $n_{i+1} = n^{(c-i)/(c+1)}$ elements each, called level-$(i+1)$ subsets.

Also, we conventionally say that $S$ is the level-0 subset.
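For concrete numbers, the sketch below computes $p = n^{1/(c+1)}$ and the level sizes $n_i = n^{(c-i+1)/(c+1)} = p^{c-i+1}$ using exact integer roots. For $n$ that are not perfect powers it rounds $p$ down, which is an assumption about how sizes would be chosen in practice; the helper names are hypothetical.

```python
def iroot(n, k):
    """Largest integer r with r**k <= n (exact integer arithmetic)."""
    lo, hi = 0, 1
    while hi ** k <= n:
        hi *= 2
    while hi - lo > 1:           # invariant: lo**k <= n < hi**k
        mid = (lo + hi) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid
    return lo


def level_sizes(n, c):
    """Return p = n^(1/(c+1)) and the sizes p^(c-i+1) of the level-i subsets, i = 1..c."""
    p = iroot(n, c + 1)
    return p, [p ** (c - i + 1) for i in range(1, c + 1)]


print(level_sizes(10**6, 1))     # (1000, [1000])
print(level_sizes(10**6, 2))     # (100, [10000, 100])
```

With $c = 1$ this reduces to the $\sqrt{n}$ grouping of the parameterized scheme.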

Next, we associate a value $a(Y)$ with each subset $Y$ of the above partition, as follows:

• The value of a level-$c$ subset is the accumulation of the representatives of its elements.

• For $i = 0, \ldots, c - 1$, the value of a level-$i$ subset is the accumulation of the representatives of the values of its level-$(i+1)$ subsets.

Finally, we store with each level-$i$ subset $Y$ a data structure based on the precomputed accumulations scheme of Section 3 that stores and validates membership in the set $S(Y)$ of the values of the level-$(i+1)$ subsets of $Y$.

Let $e$ be an element of $S$. To prove that $e$ is contained in $S$, the directory determines, for $i = 1, \ldots, c$, the level-$i$ subset $Y_i$ containing $e$ and returns the sequence of values $a(Y_c), a(Y_{c-1}), \ldots, a(Y_0)$ plus witnesses for the following $c + 1$ memberships:

• $e \in Y_c$;

• $a(Y_i) \in S(Y_{i-1})$ for $i = c, \ldots, 1$.
The user can verify each of the above memberships by means of an exponentiation. Thus, the verification time and the size of the query authentication information are proportional to $c$, i.e., they are $O(1)$. The performance of the hierarchical accumulations scheme is summarized in Table 5.
space | insertion time | deletion time | update info | query time | query info | verify time
O(n) | O(n^ε) | O(n^ε) | O(n^ε) | O(n^ε) | O(1) | O(1)

Table 5: Hierarchical accumulations scheme for implementing an authenticated dictionary with an RSA accumulator, where $1/(c+1) < \epsilon < 1$ is a fixed constant (there is a constant factor of $1/\epsilon$ "hiding" behind each of the big-ohs in this table). Each of these running times counts modular exponentiations and other arithmetic operations as constant-time operations; hence, they can also be viewed in terms of alternative bounds by using the above bounds as characterizing the number of such modular operations.
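The user's side of the hierarchical scheme can be illustrated with a toy instance with $c = 2$ and eight elements. Everything here is a stated assumption rather than the paper's construction: the modulus and base are tiny, the hash $h$ is replaced by dropping the 8 low-order bits, prime representatives are found by trial division, witnesses are computed directly instead of via the precomputed structures, and the signature on the top-level value is omitted. Each step of `verify` performs one exponentiation and checks that its container is the member of the next step.

```python
from functools import lru_cache
from math import isqrt

# Toy c = 2 hierarchy (hypothetical parameters, insecure sizes, no signatures).
N, a, B = 1009 * 1013, 2, 8    # modulus, base, and number of low-order "free" bits


def is_prime(n):
    """Trial division; adequate only for the small toy numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    return all(n % d for d in range(3, isqrt(n) + 1, 2))


def h(x):
    """Toy stand-in for the hash h: drop the B low-order bits."""
    return x >> B


@lru_cache(maxsize=None)
def rep(v):
    """Smallest prime x with h(x) == v (toy prime representative of value v)."""
    for r in range(1 << B):
        if is_prime((v << B) | r):
            return (v << B) | r
    raise ValueError("no prime representative in range")


def acc(values):
    """Accumulate the prime representatives of the given values."""
    out = a
    for v in values:
        out = pow(out, rep(v), N)
    return out


def member_proof(values, i):
    """(member, representative, witness, container) for values[i] in the set `values`."""
    return values[i], rep(values[i]), acc(values[:i] + values[i + 1:]), acc(values)


# S = {1, ..., 8}: level-1 subsets of size 4, level-2 subsets of size 2 (p = 2).
level2 = [[1, 2], [3, 4], [5, 6], [7, 8]]
v2 = [acc(Y) for Y in level2]                 # a(Y) for the level-2 subsets
v1 = [acc(v2[0:2]), acc(v2[2:4])]             # a(Y) for the level-1 subsets
A = acc(v1)                                   # a(S), signed by the source

# Proof for e = 6: e in Y_2, a(Y_2) in S(Y_1), a(Y_1) in S(Y_0).
proof = [member_proof(level2[2], 1), member_proof(v2[2:4], 0), member_proof(v1, 1)]


def verify(proof, A_signed):
    """User: one exponentiation per membership; each container feeds the next step."""
    for k, (member, x, w, container) in enumerate(proof):
        if h(x) != member or pow(w, x, N) != container:
            return False
        if k + 1 < len(proof) and proof[k + 1][0] != container:
            return False
    return proof[-1][3] == A_signed


print(verify(proof, A))    # True
```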

The hierarchical accumulations scheme is likely to outperform the parameterized accumulations scheme in practice only for large-scale authenticated dictionaries (say, containing billions of entries), where the difference between $n^{1/(c+1)}$ and $n^{1/2}$ is significant and offsets the added complication of changing the client code and introducing the $(c+1)$-level accumulation hierarchy.

Theorem 5.1. The hierarchical accumulations scheme for implementing an authenticated dictionary over a set of size $n$ uses data structures with $O(n)$ space at the source and directories and has the following performance, for a given constant $\epsilon$ such that $0 < \epsilon < 1$:

• the insertion and deletion operations for the source can each be done with $O(n^{\epsilon})$ exponentiations;

• the update authentication information has size $O(n^{\epsilon})$;

• a query at a directory can be answered with $O(n^{\epsilon})$ exponentiations;

• the query authentication information has size $O(1)$; and

• the verification for a user requires only $O(1)$ exponentiations.

We can extend the hierarchical accumulations scheme by using a more general hierarchical partitioning of the set $S$ while keeping constant the size of the query authentication information as well as the number of exponentiations for verification. The two extreme partitioning strategies are: (i) a single-level partition into $O(1)$ groups of size $O(n)$, and (ii) an $O(\log n)$-level partition where the size of each partition is $O(1)$ (this corresponds to a hierarchy that can be mapped into a bounded-degree tree). The insertion and deletion times and the update authentication information size are then proportional to $O(\sum_i g_i)$, where $g_i$ is the size of the partition at the $i$-th level, and can range from $O(1)$ to $O(n)$. At the same time, the query time is proportional to $O(c + g_{c-1})$, and can range from $O(n)$ to $O(1)$. The number of precomputed values that need to be stored affects the amount of space needed per element, which varies from $O(1)$ to $O(n)$. Thus, the space increase per element is $O(1)$; more precisely, at most two precomputed values are stored per element in the dictionary (if the underlying hierarchy tree is binary).



6 Security 



We now show that an adversarial directory cannot forge a proof of membership for an element that is not in $S$. Our proof closely follows related constructions given in [13, 23, 41]. An important property of the scheme comes from representing the elements $e$ of set $S$ with prime numbers. If the accumulator scheme were used without this step, it would be insecure: an adversarial directory could forge proofs of membership for all the divisors of elements whose proofs it has seen.
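The divisor attack is easy to reproduce with toy numbers. In this hypothetical example a composite value 15 is accumulated directly instead of a prime representative; raising its honest witness to the cofactor 5 yields a valid-looking witness for 3, which was never inserted.

```python
# Why representatives must be prime: forging a proof for a divisor (toy numbers).
N = 1009 * 1013                # hypothetical tiny modulus
a = 2
x_i = 15                       # a NON-prime value accumulated directly: 15 = 3 * 5
others = 17 * 19               # product of the other accumulated values
A = pow(a, x_i * others, N)    # published accumulation
A_i = pow(a, others, N)        # honest witness for 15: A_i^15 == A

forged = pow(A_i, 5, N)        # adversary raises the witness to the cofactor 5 ...
print(pow(forged, 3, N) == A)  # ... and obtains a witness for 3, never inserted: True
```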

Theorem 6.1. In the dynamic accumulator schemes for authenticated dictionaries defined in the 
previous sections, under the strong RSA assumption, a directory whose resources are polynomially 
bounded can produce a proof of membership only for the elements that are in the dictionary. 

Proof. Our proof is based on related proofs given in [41]. Assume an adversarial directory $D$ has seen proofs of membership for all the elements $e_1, e_2, \ldots, e_n$ of the dictionary $S$. The trusted source has computed representatives $x_1, x_2, \ldots, x_n$ as suitable primes defined in Section 2.6. The witnesses $A_1, A_2, \ldots, A_n$ have been computed as well, either solely by the trusted source, or by balancing the work between the trusted source and the directories. The trusted source has distributed a signed pair $(A, t)$. By the definition of the scheme in Section 2.6, for all $1 \le i \le n$, we have:

• $x_i$ is the prime representative of $e_i \in S$, i.e., $h(x_i) = e_i$;

• $\sqrt{2^{3k}} < x_i < 2^{3k}$;

• $A_i^{x_i} \bmod N = A$.

We need to show that directory $D$ cannot prove the membership of an element $e_{n+1}$ that is not already in the set $S$. The proof is by contradiction. Suppose that $D$ has found a triplet $(e_{n+1}, x_{n+1}, A_{n+1})$ proving the membership of $e_{n+1}$. Then the following must hold and can be checked by the user (it is not necessary for $x_{n+1}$ to be a prime):

• $h(x_{n+1}) = e_{n+1}$;

• $0 < x_{n+1} < 2^{3k}$;

• $A_{n+1}^{x_{n+1}} \bmod N = A$.

Let $d = \gcd(x_{n+1}, x_1 x_2 \cdots x_n)$. Thus, we have $\gcd\left(\frac{x_{n+1}}{d}, \frac{x_1 x_2 \cdots x_n}{d}\right) = 1$. Define $f = \frac{x_{n+1}}{d}$. There are integers $u, v$ such that $v \frac{x_1 x_2 \cdots x_n}{d} + u f = 1$ holds over the integers. Directory $D$ can find $u$ and $v$ in polynomial time using the extended Euclidean algorithm. Set $s = A_{n+1}^{v} a^{u}$. We have



Thus, directory $D$ can find in polynomial time a value $s$ that is an $f$-th root of $a$. By the strong RSA assumption (Section 2.7), it must be that $f = 1$. Hence, we have $x_{n+1} = d$, and it follows that $x_{n+1}$ divides $x_1 x_2 \cdots x_n$. But by our assumptions we have $x_{n+1} < 2^{3k}$ and $x_i > \sqrt{2^{3k}}$ for each $i$, so the product of any two of the $x_i$ already exceeds $2^{3k}$; this implies that $x_{n+1} = x_i$ for some $1 \le i \le n$. Thus, $e_{n+1} = h(x_{n+1}) = h(x_i) = e_i$, i.e., element $e_{n+1}$ is already in set $S$, which is a contradiction.

We conclude that the adversarial directory D can find membership proofs only for those elements 
already in S. □ 



7 Experimental Results 

In this section, we present a preliminary experimental study of the performance of the dynamic accumulator schemes for authenticated dictionaries described in this paper. The main results of this study are summarized in the charts of Figures 4 and 5, where the x-axis represents the size of the dictionary (number of elements) and the y-axis represents the average time of the given operation in microseconds. We denote by $(f_1(n), f_2(n), \ldots, f_c(n))$ a generalized hierarchical partition scheme of the dictionary with $O(f_i(n))$ elements in each $i$-th level group (Section 5).

The dynamic accumulator scheme has been implemented in Java, and the experiments have been conducted on an AMD Athlon XP 1700+ 1.47GHz with 512MB of memory, running Linux. The items stored in the dictionary and the query values are randomly generated 165-bit integers, and the parameter $N$ of the RSA accumulator is a 200-bit integer. The variance between different runs of the query and deletion operations was found to be consistently small, so only a few runs were done for each dictionary size considered.

The main performance bottleneck of the scheme was found to be the computation of prime representatives for the elements. In our experiments, finding a prime representative of a 165-bit integer using the standard approach of Section 2.6 takes about 45 milliseconds and dominates the rest of the insertion time. The computation of prime representatives is a constant overhead that does not depend on the number of elements, and it has been omitted from the rest of the analysis.
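Because the prime-representative search dominates insertion time, a sketch of one common approach may help. It is an assumption, not necessarily the standard approach of Section 2.6: the element $e$ is shifted left by $b$ bits and the low-order bits are searched for a prime, so that the toy map $x \mapsto \lfloor x / 2^b \rfloor$ recovers $e$ (a real scheme uses the two-universal hash $h$). Primality is tested with Miller-Rabin using the first twelve primes as bases.

```python
# Hypothetical prime-representative search (not necessarily the method of Section 2.6).
# Assumption: x = (e << b) | r for the smallest r making x prime, so that the toy
# function x >> b maps x back to e. A real scheme uses the hash h of Section 2.6.
SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n):
    """Miller-Rabin with the first twelve primes as bases.

    Deterministic for n < 3.3 * 10**24; for larger n (e.g., 181-bit candidates)
    it is a strong probable-prime test, not a proof of primality.
    """
    if n < 2:
        return False
    for q in SMALL_PRIMES:
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for base in SMALL_PRIMES:
        y = pow(base, d, n)
        if y in (1, n - 1):
            continue
        for _ in range(s - 1):
            y = y * y % n
            if y == n - 1:
                break
        else:
            return False           # base is a witness that n is composite
    return True


def prime_representative(e, b=16):
    """Smallest prime x with x >> b == e; raises if the 2**b candidates contain none."""
    for r in range(1 << b):
        x = (e << b) | r
        if is_probable_prime(x):
            return x
    raise ValueError("no prime found; increase b")


print(prime_representative(5, b=8))   # 1283
```

Since primes near $x$ have density about $1/\ln x$, the number of candidates tried grows only with the bit length of the element, which is consistent with treating this step as an overhead independent of $n$.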

Figure 4 illustrates two performance tradeoffs. Part (a) compares the performance of the two extreme naive approaches, where either the source or the directory does essentially all the work. Since the source can use modular multiplication while the directory has to use modular exponentiation, it is more effective to shift as much of the insertion work as possible to the source. Part (b) shows the benefits of partitioning, which reduces the computation time at the source.

Experimental results on the hierarchical accumulations method (Section 5) are presented in Figure 5. These results show that one can tune the partitioning scheme according to the processing power available at the source and the directory. In particular, this experimental analysis shows that the 2-level $(n^{1/2}, n^{1/4})$ partitioning scheme is superior to the $(n^{2/3}, n^{1/3})$ partitioning scheme, and both are much better than unpartitioned schemes (which would have linear performance).
Figure 4: Performance tradeoffs for dynamic accumulators. The insertion time at the source excludes the computation of the prime representative. Note that we use a logarithmic scale for the y-axis. (a) Query time at the directory (stars) when the directory computes the witness from scratch for each query (using modular exponentiations) vs. insertion time at the source (diamonds) when the source precomputes all the exponents of the witnesses (using modular multiplications). (b) Insertion time at the source (diamonds) without partitioning, when the exponents of all $n$ witnesses are precomputed, vs. with partitioning (stars), when a 2-level $(n^{1/2}, n^{1/4})$ partitioning scheme is used and the exponents of $O(n^{1/2})$ witnesses are precomputed. Times are given in microseconds.
Figure 5: Insertion and deletion times at the source and query time at the directory for two variants of the hierarchical accumulations approach (Section 5) on dictionaries with up to one million elements. The time for computing the prime representative of an element has been omitted from the insertion time. The stars represent a 2-level $(n^{1/2}, n^{1/4})$ partitioning scheme and the diamonds represent a 2-level $(n^{2/3}, n^{1/3})$ partitioning scheme.




8 Discussion and Conclusion 



We have shown how to make the RSA accumulator function the basis for a practical and efficient 
scheme for authenticated dictionaries, which relies on reasonable cryptographic assumptions similar 
to those that justify RSA encryption. A distinctive advantage of our approach is that the validation 
of a query result performed by the user takes constant time and requires computations (a single 
exponentiation and digital signature verification) simple enough to be performed in devices with 
very limited computing power, such as a smart card or a mobile phone. 

An important aspect of our scheme is that it is dynamic and distributed, thus supporting efficient updates and balancing the work between the source and the directories. A first variation of our scheme achieves a complete tradeoff between the cost of updates at the source and of queries at the directories, with updates taking $O(p + \log(n/p))$ time and queries taking $O(n/p)$ time, for any fixed integer parameter $1 < p < n$. For example, we can achieve $O(\sqrt{n})$ time for both updates and queries. A second variation of our scheme, suitable for large data sets, achieves $O(n^{\epsilon})$-time performance for updates and queries while keeping $O(1)$ verification time, where $\epsilon > 0$ is any fixed constant.
Our scheme can be easily adapted to contexts, such as certificate revocation queries, where one also needs to validate that an item $e$ is not in the set $S$. In this case, we use the standard method of storing in the dictionary not the items themselves, but instead the ranges $r_i = [e_i, e_{i+1}]$ between consecutive elements in a sorted list of the elements of $S$ (see, e.g., Kocher [30]). A query for an element $e$ returns a range $r_i = [e_i, e_{i+1}]$ such that $e_i \le e \le e_{i+1}$, plus a cryptographic validation of range $r_i$. If $e$ is one of the endpoints of $r_i$, then $e$ is in $S$; otherwise ($e_i < e < e_{i+1}$), $e$ is not in $S$. Note that this approach also requires that we have a way of representing some notion of $-\infty$ and $+\infty$. Even so, the overhead adds only a constant factor to all the running times for updates, queries, and validations.
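The range-based lookup and decision rule can be sketched as follows. This is a simplification under stated assumptions: the cryptographic validation of the returned range through the accumulator is omitted, float infinities stand in for whatever integer encoding of $-\infty$ and $+\infty$ a real deployment would accumulate, and the helper names are hypothetical.

```python
from bisect import bisect_right
from math import inf

# Hypothetical sketch of range-based (non-)membership; validating the range itself
# through the accumulator is omitted. The float sentinels -inf/+inf stand in for
# whatever integer encoding of -oo and +oo a real scheme would accumulate.


def build_ranges(elements):
    """Store the ranges [e_i, e_{i+1}] between consecutive sorted elements."""
    points = [-inf] + sorted(elements) + [inf]
    return [(points[i], points[i + 1]) for i in range(len(points) - 1)]


def find_range(ranges, e):
    """Directory: return the stored range (lo, hi) with lo <= e <= hi."""
    starts = [lo for lo, _ in ranges]          # O(n) here; a real directory would index these
    return ranges[bisect_right(starts, e) - 1]


def is_member(r, e):
    """User, after validating r: e is in S iff it is an endpoint of r."""
    lo, hi = r
    if not lo <= e <= hi:
        raise ValueError("range does not cover the query")
    return e in (lo, hi)


ranges = build_ranges([10, 20, 30])
print(is_member(find_range(ranges, 20), 20))  # True
print(is_member(find_range(ranges, 15), 15))  # False
```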